Representation of Document Archives for Interactive Exploration
Abstract
Today's information age is characterized by the constant, massive production and dissemination of written information. More powerful tools for exploring, searching, and organizing this mass of information are needed to cope with the situation. In this context, the map metaphor for displaying the contents of a document archive in a two-dimensional display has gained increased interest. In particular, we rely on self-organizing maps, which produce a map of the document space after their training process. From geography, however, it is known that maps are not always the best way to represent information spaces. For most applications it is better to provide a hierarchical view of the underlying data collection in the form of an atlas, where, starting from a map representing the complete data collection, different regions are shown at finer levels of granularity. Using an atlas, the user can easily "zoom" into regions of particular interest while still having general maps for overall orientation. We show that a similar display can be obtained by using the Growing Hierarchical Self-Organizing Map (GHSOM) to represent the contents of a document archive. This neural network model has an adaptive layered architecture in which each layer consists of a number of individual self-organizing maps. In this way, the contents of the text archive can be represented at arbitrary detail while the general maps remain available for global orientation.
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
A Comparative Study of Agency in Signing Commercial Instruments (Bills of Exchange, Promissory Notes, and Cheques)
According to Article 227 of the Commerce Law and Article 19 on drawing cheques, which concern appointing a representative for the issuance of drafts and cheques, the following questions have always been present: a) whether this representation exists only at the time of signing a document or will be present at other stages such as endorsement and assurance, too? b) Whether the responsibility of s...
A Metaphor Graphics Based Representation of Digital Libraries on the World Wide Web: Using the libViewer to Make Metadata Visible
While methods for searching large digital libraries have experienced tremendous improvements recently, interfaces to such collections still have a long way to go. Most interfaces to digital libraries present themselves as various forms of sorted lists, providing metadata information on the documents in textual form. This prohibits intuitive understanding of document archives or web search result...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
The Representation of Social Actors in the Graduate Employability Issue: Online News and the Government Document
This paper presents the first part of a larger study on the issue of graduate employability in Malaysia as construed in public discourse in English, a language of power in Malaysia. The term employability itself has many definitions depending on the requirements of government and industry, and in the case of Malaysia, the English-language ability of graduates is inseparable from graduate employ...